智能论文笔记

Fast Hierarchical Deep Unfolding Network for Image Compressed Sensing

Wenxue Cui , Shaohui Liu , Debin Zhao

分类：计算机视觉

2022-08-03

通过将某些优化求解器与深神经网络相结合，深层展开网络（DUN）近年来引起了图像压缩感（CS）的广泛关注。但是，现有DUN中仍然存在几个问题：1）对于每次迭代，通常采用一个简单的堆叠卷积网络，这显然限制了这些模型的表现力。 2）培训完成后，对于任何输入内容，大多数现有DUNS的超参数均已固定，这大大削弱了其适应性。在本文中，通过展开快速迭代的收缩阈值算法（FISTA），提出了一种新颖的快速分层dun，被称为Fhdun，用于图像压缩传感，开发出了精心设计的层次结构，以合作探索富人的上下文，以探索富人的上下文。多尺度空间中的信息。为了进一步增强适应性，在我们的框架中开发了一系列的超参数生成网络，以根据输入内容动态生产相应的最佳超参数。此外，由于Fista的加速政策，新嵌入的加速模块使拟议的Fhdun节省了超过50％的迭代循环，以抵抗最近的Duns。广泛的CS实验表明，所提出的FHDUN优于现有的最新CS方法，同时保持较少的迭代。

translated by 谷歌翻译

Image Compressed Sensing Using Non-local Neural Network

Wenxue Cui , Shaohui Liu , Feng Jiang , Debin Zhao

分类：计算机视觉

2021-12-07

基于深度网络的图像压缩感（CS）近年来引起了很多关注。然而，现有的基于深网络的CS方案以逐个块的方式重建目标图像，其导致严重的块伪像或将深网络训练为黑盒，其带来了对图像先验知识的有限识别。本文提出了一种使用非局部神经网络（NL-CSNet）的新型图像CS框架，其利用具有深度网络的非本地自相似子，提高重建质量。在所提出的NL-CSNET中，构造了两个非本地子网，用于分别利用测量域中的非本地自相似子系统和多尺度特征域。具体地，在测量域的子网中，建立用于更好的初始重建的不同图像块的测量之间的长距离依赖性。类似地，在多尺度特征域的子网中，在深度重建的多尺度空间中探讨了密集特征表示之间的亲和力。此外，开发了一种新的损失函数以增强非本地表示之间的耦合，这也能够实现NL-CSNet的端到端训练。广泛的实验表明，NL-CSNet优于现有的最先进的CS方法，同时保持快速的计算速度。

translated by 谷歌翻译

Understanding Imbalanced Semantic Segmentation Through Neural Collapse

Zhisheng Zhong , Jiequan Cui , Yibo Yang , Xiaoyang Wu , Xiaojuan Qi , Xiangyu Zhang , Jiaya Jia

分类：计算机视觉 | 机器学习

2023-01-03

A recent study has shown a phenomenon called neural collapse in that the within-class means of features and the classifier weight vectors converge to the vertices of a simplex equiangular tight frame at the terminal phase of training for classification. In this paper, we explore the corresponding structures of the last-layer feature centers and classifiers in semantic segmentation. Based on our empirical and theoretical analysis, we point out that semantic segmentation naturally brings contextual correlation and imbalanced distribution among classes, which breaks the equiangular and maximally separated structure of neural collapse for both feature centers and classifiers. However, such a symmetric structure is beneficial to discrimination for the minor classes. To preserve these advantages, we introduce a regularizer on feature centers to encourage the network to learn features closer to the appealing structure in imbalanced semantic segmentation. Experimental results show that our method can bring significant improvements on both 2D and 3D semantic segmentation benchmarks. Moreover, our method ranks 1st and sets a new record (+6.8% mIoU) on the ScanNet200 test leaderboard. Code will be available at https://github.com/dvlab-research/Imbalanced-Learning.

translated by 谷歌翻译

Benchmarking the Robustness of LiDAR Semantic Segmentation Models

Xu Yan , Chaoda Zheng , Zhen Li , Shuguang Cui , Dengxin Dai

分类：计算机视觉

2023-01-03

When using LiDAR semantic segmentation models for safety-critical applications such as autonomous driving, it is essential to understand and improve their robustness with respect to a large range of LiDAR corruptions. In this paper, we aim to comprehensively analyze the robustness of LiDAR semantic segmentation models under various corruptions. To rigorously evaluate the robustness and generalizability of current approaches, we propose a new benchmark called SemanticKITTI-C, which features 16 out-of-domain LiDAR corruptions in three groups, namely adverse weather, measurement noise and cross-device discrepancy. Then, we systematically investigate 11 LiDAR semantic segmentation models, especially spanning different input representations (e.g., point clouds, voxels, projected images, and etc.), network architectures and training schemes. Through this study, we obtain two insights: 1) We find out that the input representation plays a crucial role in robustness. Specifically, under specific corruptions, different representations perform variously. 2) Although state-of-the-art methods on LiDAR semantic segmentation achieve promising results on clean data, they are less robust when dealing with noisy data. Finally, based on the above observations, we design a robust LiDAR segmentation model (RLSeg) which greatly boosts the robustness with simple but effective modifications. It is promising that our benchmark, comprehensive analysis, and observations can boost future research in robust LiDAR semantic segmentation for safety-critical applications.

translated by 谷歌翻译

ResGrad: Residual Denoising Diffusion Probabilistic Models for Text to Speech

Zehua Chen , Yihan Wu , Yichong Leng , Jiawei Chen , Haohe Liu , Xu Tan , Yang Cui , Ke Wang , Lei He , Sheng Zhao

分类：自然语言处理 | 机器学习

2022-12-30

Denoising Diffusion Probabilistic Models (DDPMs) are emerging in text-to-speech (TTS) synthesis because of their strong capability of generating high-fidelity samples. However, their iterative refinement process in high-dimensional data space results in slow inference speed, which restricts their application in real-time systems. Previous works have explored speeding up by minimizing the number of inference steps but at the cost of sample quality. In this work, to improve the inference speed for DDPM-based TTS model while achieving high sample quality, we propose ResGrad, a lightweight diffusion model which learns to refine the output spectrogram of an existing TTS model (e.g., FastSpeech 2) by predicting the residual between the model output and the corresponding ground-truth speech. ResGrad has several advantages: 1) Compare with other acceleration methods for DDPM which need to synthesize speech from scratch, ResGrad reduces the complexity of task by changing the generation target from ground-truth mel-spectrogram to the residual, resulting into a more lightweight model and thus a smaller real-time factor. 2) ResGrad is employed in the inference process of the existing TTS model in a plug-and-play way, without re-training this model. We verify ResGrad on the single-speaker dataset LJSpeech and two more challenging datasets with multiple speakers (LibriTTS) and high sampling rate (VCTK). Experimental results show that in comparison with other speed-up methods of DDPMs: 1) ResGrad achieves better sample quality with the same inference speed measured by real-time factor; 2) with similar speech quality, ResGrad synthesizes speech faster than baseline methods by more than 10 times. Audio samples are available at https://resgrad1.github.io/.

translated by 谷歌翻译

Towards AI-Empowered Crowdsourcing

Shipeng Wang , Qingzhong Li , Lizhen Cui , Zhongmin Yan , Yonghui Xu , Zhuan Shi , Zhiqi Shen , Han Yu

分类：人工智能

2022-12-28

Crowdsourcing, in which human intelligence and productivity is dynamically mobilized to tackle tasks too complex for automation alone to handle, has grown to be an important research topic and inspired new businesses (e.g., Uber, Airbnb). Over the years, crowdsourcing has morphed from providing a platform where workers and tasks can be matched up manually into one which leverages data-driven algorithmic management approaches powered by artificial intelligence (AI) to achieve increasingly sophisticated optimization objectives. In this paper, we provide a survey presenting a unique systematic overview on how AI can empower crowdsourcing - which we refer to as AI-Empowered Crowdsourcing(AIEC). We propose a taxonomy which divides algorithmic crowdsourcing into three major areas: 1) task delegation, 2) motivating workers, and 3) quality control, focusing on the major objectives which need to be accomplished. We discuss the limitations and insights, and curate the challenges of doing research in each of these areas to highlight promising future research directions.

translated by 谷歌翻译

Shape-Aware Fine-Grained Classification of Erythroid Cells

Ye Wang , Rui Ma , Xiaoqing Ma , Honghua Cui , Yubin Xiao , Xuan Wu , You Zhou

分类：计算机视觉

2022-12-28

Fine-grained classification and counting of bone marrow erythroid cells are vital for evaluating the health status and formulating therapeutic schedules for leukemia or hematopathy. Due to the subtle visual differences between different types of erythroid cells, it is challenging to apply existing image-based deep learning models for fine-grained erythroid cell classification. Moreover, there is no large open-source datasets on erythroid cells to support the model training. In this paper, we introduce BMEC (Bone Morrow Erythroid Cells), the first large fine-grained image dataset of erythroid cells, to facilitate more deep learning research on erythroid cells. BMEC contains 5,666 images of individual erythroid cells, each of which is extracted from the bone marrow erythroid cell smears and professionally annotated to one of the four types of erythroid cells. To distinguish the erythroid cells, one key indicator is the cell shape which is closely related to the cell growth and maturation. Therefore, we design a novel shape-aware image classification network for fine-grained erythroid cell classification. The shape feature is extracted from the shape mask image and aggregated to the raw image feature with a shape attention module. With the shape-attended image feature, our network achieved superior classification performance (81.12\% top-1 accuracy) on the BMEC dataset comparing to the baseline methods. Ablation studies also demonstrate the effectiveness of incorporating the shape information for the fine-grained cell classification. To further verify the generalizability of our method, we tested our network on two additional public white blood cells (WBC) datasets and the results show our shape-aware method can generally outperform recent state-of-the-art works on classifying the WBC. The code and BMEC dataset can be found on https://github.com/wangye8899/BMEC.

translated by 谷歌翻译

NEEDED: Introducing Hierarchical Transformer to Eye Diseases Diagnosis

Xu Ye , Meng Xiao , Zhiyuan Ning , Weiwei Dai , Wenjuan Cui , Yi Du , Yuanchun Zhou

分类：自然语言处理

2022-12-27

With the development of natural language processing techniques(NLP), automatic diagnosis of eye diseases using ophthalmology electronic medical records (OEMR) has become possible. It aims to evaluate the condition of both eyes of a patient respectively, and we formulate it as a particular multi-label classification task in this paper. Although there are a few related studies in other diseases, automatic diagnosis of eye diseases exhibits unique characteristics. First, descriptions of both eyes are mixed up in OEMR documents, with both free text and templated asymptomatic descriptions, resulting in sparsity and clutter of information. Second, OEMR documents contain multiple parts of descriptions and have long document lengths. Third, it is critical to provide explainability to the disease diagnosis model. To overcome those challenges, we present an effective automatic eye disease diagnosis framework, NEEDED. In this framework, a preprocessing module is integrated to improve the density and quality of information. Then, we design a hierarchical transformer structure for learning the contextualized representations of each sentence in the OEMR document. For the diagnosis part, we propose an attention-based predictor that enables traceable diagnosis by obtaining disease-specific information. Experiments on the real dataset and comparison with several baseline models show the advantage and explainability of our framework.

translated by 谷歌翻译

A Survey on Federated Recommendation Systems

Zehua Sun , Yonghui Xu , Yong Liu , Wei He , Yali Jiang , Fangzhao Wu , Lizhen Cui

分类：人工智能 | 机器学习

2022-12-27

Federated learning has recently been applied to recommendation systems to protect user privacy. In federated learning settings, recommendation systems can train recommendation models only collecting the intermediate parameters instead of the real user data, which greatly enhances the user privacy. Beside, federated recommendation systems enable to collaborate with other data platforms to improve recommended model performance while meeting the regulation and privacy constraints. However, federated recommendation systems faces many new challenges such as privacy, security, heterogeneity and communication costs. While significant research has been conducted in these areas, gaps in the surveying literature still exist. In this survey, we-(1) summarize some common privacy mechanisms used in federated recommendation systems and discuss the advantages and limitations of each mechanism; (2) review some robust aggregation strategies and several novel attacks against security; (3) summarize some approaches to address heterogeneity and communication costs problems; (4)introduce some open source platforms that can be used to build federated recommendation systems; (5) present some prospective research directions in the future. This survey can guide researchers and practitioners understand the research progress in these areas.

translated by 谷歌翻译

Deep Cost-sensitive Learning for Wheat Frost Detection

Shujian Cao , Lin Cui , Haipeng Liu

分类：计算机视觉

2022-12-25

Frost damage is one of the main factors leading to wheat yield reduction. Therefore, the detection of wheat frost accurately and efficiently is beneficial for growers to take corresponding measures in time to reduce economic loss. To detect the wheat frost, in this paper we create a hyperspectral wheat frost data set by collecting the data characterized by temperature, wheat yield, and hyperspectral information provided by the handheld hyperspectral spectrometer. However, due to the imbalance of data, that is, the number of healthy samples is much higher than the number of frost damage samples, a deep learning algorithm tends to predict biasedly towards the healthy samples resulting in model overfitting of the healthy samples. Therefore, we propose a method based on deep cost-sensitive learning, which uses a one-dimensional convolutional neural network as the basic framework and incorporates cost-sensitive learning with fixed factors and adjustment factors into the loss function to train the network. Meanwhile, the accuracy and score are used as evaluation metrics. Experimental results show that the detection accuracy and the score reached 0.943 and 0.623 respectively, this demonstration shows that this method not only ensures the overall accuracy but also effectively improves the detection rate of frost samples.

translated by 谷歌翻译